Natural language search results for intent queries

ABSTRACT

Systems and methods provide natural language search results to clear-intent queries. To provide the natural language search results, a system may parse a document from an authoritative source to generate at least one heading-text pair, the text appearing under the heading in the document. The system may assign a topic and a question category to the heading-text pair and store the heading-text pair in a data store keyed by the topic and the question category. The system determines that a query corresponds to the topic and the question category, and provides the heading-text pair as a natural language search result for the query. In some implementations, the text portion of the heading-text pair may be a paragraph or a list of items and the natural language search result may be provided with conventional snippet-based search results in response to the query.

RELATED APPLICATION

This application is a continuation of, and claims priority to, U.S. application Ser. No. 13/910,031, filed Jun. 4, 2013, entitled “NATURAL LANGUAGE SEARCH RESULTS FOR INTENT QUERIES,” the disclosure of which is incorporated herein by reference.

BACKGROUND

Search engines are a popular method of discovering information. Traditionally, search engines crawl documents in a corpus, generate an inverted index for the documents, and use the index to determine which documents are responsive to a search query. Search results commonly include a title from a responsive document and a snippet of text from the document that includes one or more of the search terms in the query. Such snippets are not natural language results and typically fail to provide a complete, easily understood answer to non-factual questions where there is no one correct answer. While a user can select the link associated with the snippet to view the context of the snippet in the original document to determine whether the identified information is adequate, this slows the user experience and involves additional effort on the part of the user to receive an answer to a non-factual question.

SUMMARY

Some implementations enable a search system to provide enhanced search results to natural language and non-factual queries. The search system may enable a query requestor to receive relevant answers in an intuitive format without having to load and read the original document source. A natural language query is a query using terms a person would use to ask a question, such as “how do I make hummus?” Some natural language queries are non-factual. A non-factual query may be a query that includes a request for specific information about a topic. The specific information is considered the question category and can have the same format for questions directed to various topics. For example, in a cooking context a query requestor may have a question about making toffee. Recipe instructions are the specific information requested for the topic of toffee and the instructions may include diverse or complex information. In some implementations, the search system may perform offline processing of authoritative sources to determine and store answers to common clear-intent non-factual questions. The search system may identify clear-intent queries and match the queries to the stored answers and provide an enhanced search result with complete answers from one or more authoritative sources.

One aspect of the disclosure can be embodied in a computer-implemented method that includes parsing, using at least one processor, documents from authoritative sources to generate heading-text pairs. For each heading-text pair, the method also includes, associating, using the at least one processor, the heading-text pair with a first intent template of a plurality of intent templates, the first intent template having an associated question category, determining a topic and a question category for the heading-text pair based on the first intent template, and storing the heading-text pair in a data store keyed by topic and question category. The method may also include determining that a query corresponds to a second intent template of the plurality of intent templates, the second intent template having an associated second question category, determining a second topic for the query based on the second intent template, retrieving heading-text pairs from the data store that have a topic and question category key that correspond with the second topic and the second question category, and providing a search result for the query, wherein the search result includes at least one of the retrieved heading-text pairs.

The method can include one or more of the following features. For example, the second intent template can include one non-variable portion and one variable portion. In such an implementation, corresponding the query to the second intent template can include determining that the query includes a first term that corresponds to the one non-variable portion, determining that a second term in the query aligns with the variable portion, and determining that the second term in the query corresponds to a topic in the data store. As another example, corresponding the query to the second intent template can include generating potential templates from terms of the query and determining whether one of the potential templates corresponds to the second intent template. In some implementations associating the heading-text pair with the first intent template includes determining that text of the heading corresponds to a non-variable portion of the first intent template and the topic is derived from text of the heading that corresponds with a variable portion of the first intent template.

In some implementations, the method can further include generating the plurality of intent templates by obtaining intent questions from authoritative sources, generating potential templates from the intent questions, determining a frequency of occurrence for each unique potential template, selecting a predetermined number of most frequently occurring potential templates, and storing the selected potential templates in a memory as the plurality of intent templates. In such implementations, the potential templates are first potential templates and generating the plurality of intent templates can further include obtaining second intent questions from search records, generating second potential templates from the second intent questions, and including the second potential templates with the first potential templates in the determining, selecting, and storing. Also in such implementations, each potential template may have at least one non-variable portion and a variable portion, the variable portion representing a starting location of a topic in text that corresponds to the non-variable portion of the potential template. Accordingly generating the plurality of intent templates can include assigning a respective question category to each selected potential template based on the non-variable portion of the selected potential template.

Another aspect of the disclosure can be embodied in a computer system that includes at least one processor and memory storing instructions that, when executed by the at least one processor, cause the computer system to perform operations. The operations include parsing a document from an authoritative source to generate at least one heading-text pair, the text appearing under the heading in the document, and assigning a topic and a question category to the heading-text pair. The operations also include storing the heading-text pair in a data store keyed by the topic and the question category, determining that a query corresponds to the topic and the question category, and providing the heading-text pair as a natural language search result for the query. A text portion of the heading-text pair may be a paragraph or a list of items that appears in the original document from the authoritative source.

The system can include one or more of the following features. For example, the operations may include generating snippet-based search results by searching an index of documents for documents responsive to the query, and providing the snippet-based search results with the natural language search result. In such an implementation, the snippet-based results can be ranked using a particular ranking algorithm, and the heading-text pairs are ranked using the same ranking algorithm. As another example, the operations may also include retrieving a plurality of heading-text pairs from the data store, each heading-text pair being keyed by the topic and the question category, ranking the plurality of heading-text pairs, and selecting a predetermined number of highest ranked heading-text pairs for the search result. In such an implementation, the plurality of heading-text pairs may be ranked based on a length of the text portion of the heading-text pair or on a similarity of the text portion with text portions of other heading-text pairs in the plurality of heading-text pairs or a combination of these.

As another example, the system may further include memory storing a plurality of intent templates and wherein the heading-text pair is generated when the heading conforms to one of the plurality of intent templates. In such an implementation, the question category may be determined by the intent template the heading conforms to. In some implementations, generating the heading-text pair includes determining a topic from a context of the heading in the document; and adding the topic to a heading portion of the heading-text pair.

Another aspect of the disclosure can be embodied in a computer system that includes at least one processor and memory storing instructions that, when executed by the at least one processor, cause the computer system to perform operations. The operations may include parsing documents from authoritative sources to generate a plurality of heading-text pairs, generating a set of potential templates from the heading-text pairs, determining a quantity of occurrences for at least some of the set of potential templates, and storing potential templates with highest quantities as intent templates in a memory of the computer system.

The system can include one or more of the following features. For example, converting the heading to potential templates may include replacing subsets of consecutive terms in the heading with a variable portion. As another example, the set of potential templates is a first set of potential templates and the operations further include determining, using search records, previously issued queries that have search results associated with the authoritative sources, generating a second set of potential templates from the determined queries, and including the second set of potential templates with the first set of potential templates as part of determining the quantity of occurrences. In some implementations, the operations include assigning a question category to the intent templates, the question category being stored as an attribute of the intent template.
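As an illustration only, the following Python sketch shows one way potential templates might be generated from headings and counted. The function names, the "$X" marker string, and the rule that at least one non-variable term must remain are assumptions made for this example and are not taken from the disclosure.

```python
from collections import Counter

VARIABLE = "$X"  # assumed marker for the variable portion


def potential_templates(heading):
    """Replace each run of consecutive terms in the heading with a variable portion."""
    terms = heading.lower().split()
    templates = set()
    for start in range(len(terms)):
        for end in range(start + 1, len(terms) + 1):
            if end - start == len(terms):
                continue  # assumption: keep at least one non-variable term
            templates.add(" ".join(terms[:start] + [VARIABLE] + terms[end:]))
    return templates


def select_intent_templates(headings, top_n):
    """Count occurrences of each potential template and keep the most frequent."""
    counts = Counter()
    for heading in headings:
        counts.update(potential_templates(heading))
    return [template for template, _ in counts.most_common(top_n)]


headings = ["diabetes causes", "heart attack causes", "flu causes"]
print(select_intent_templates(headings, top_n=1))
```

In this sketch the template shared by all three headings accumulates the highest count, which mirrors the frequency-based selection described above.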

In one implementation, the operations include receiving a natural language query and determining an intent template of the intent templates that corresponds to the natural language query, the determined intent template having an associated question category. The operations may further include determining a topic for the natural language query using the determined intent template, searching an index of documents for documents responsive to the topic and the associated question category, and providing a search result for the natural language query that includes the documents responsive to the topic and the associated question category.

Another aspect of the disclosure can be embodied on a computer-readable medium having recorded and embodied thereon instructions that, when executed by a processor of a computer system, cause the computer system to perform any of the methods disclosed herein.

One or more of the implementations of the subject matter described herein can be implemented so as to realize one or more of the following advantages. As one example, the system may provide natural language answers to a query. Natural language answers are answers in a paragraph and/or list format that provide diverse or complex answers or more than one fact per answer. The natural language answers are of high quality because they are derived from authoritative sources. Also, because the answers are natural language answers, the query requestor can view and compare complete answers quickly and effortlessly among two or more authoritative sources. Furthermore because the natural language answers provide diverse or complex answers, the user has increased confidence that the authoritative source document has a sought-for answer, even if only the beginning of the answer is provided in the search result. In some implementations, the natural language responses may be included prior to snippet-type search results, making the answers easy and intuitive to locate. In some implementations, a snippet-type search result may be removed if duplicative of one of the natural language results provided, thus automatically paring down the search results provided to the query requestor.

As another example, natural language queries may have much lower search volume, compared to keyword queries. The improved search system can identify the intent of a natural language query and, thus, provide high quality answers that a conventional search engine may miss or may not rank highly in response to the natural language query. In some implementations, the search system may convert a natural language query to a keyword query to improve the quality of snippet-based results returned for the natural language query.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates an example system in accordance with some implementations.

FIG. 2 illustrates an example of a user interface showing enhanced search results that include natural language answers, consistent with disclosed implementations.

FIG. 3 illustrates a flow diagram of an example process for providing search results enhanced with natural language answers, consistent with disclosed implementations.

FIG. 4 illustrates a flow diagram of an example process for generating intent templates, consistent with disclosed implementations.

FIG. 5 illustrates a flow diagram of an example process for generating a Question-And-Answer data store for providing natural language answers, consistent with disclosed implementations.

FIG. 6 illustrates a flow diagram of an example process for using the Question-And-Answer data store to provide an answer to a query, consistent with disclosed implementations.

FIG. 7 illustrates a flow diagram of an example process for determining whether a query includes a clear-intent question, consistent with disclosed implementations.

FIG. 8 shows an example of a computer device that can be used to implement the described techniques.

FIG. 9 shows an example of a distributed computer device that can be used to implement the described techniques.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a system 100 in accordance with an example implementation. The system 100 may be used to implement a search engine that provides natural language answers to queries that include a question with identifiable intent. The depiction of system 100 in FIG. 1 is described as a system capable of searching authoritative sources available over the Internet to generate a question-and-answer (Q&A) data store that provides the natural language answers. The system may provide intent templates used to identify queries with clear intent questions and to identify the natural language answers from the content of the authoritative sources. Other configurations and applications of the described technology may be used. For example, the system may include other methods of classifying the text from the authoritative sources or identifying clear-intent questions. As another example, natural language answers may be provided for other corpora, such as intranets, libraries, or other document repositories. In some implementations the natural language answer may replace a snippet-based search result provided for the corresponding authoritative source.

The search system 100 may receive queries 182 from a client device 180 and return search results 184 in response to the queries. Each query 182 is a request for information. Query 182 can be, for example, text, audio, images, or scroll commands. The system 100 may include search engine 116 and Question-And-Answer (Q&A) engine 110. System 100 may be a computing device that takes the form of a number of different devices, for example a standard server, a group of such servers, or a rack server system. In some implementations, Q&A engine 110 and search engine 116 may each be a separate computing device, or they may share components such as processors and memories. For example, the Q&A engine 110 and the search engine 116 may be implemented in a personal computer, for example a laptop computer. In some implementations, the Q&A engine 110 and the search engine 116 may be distributed systems implemented in a series of computing devices, such as a group of servers. The system 100 may be an example of computer device 900, as depicted in FIG. 9.

The system 100 may include a Question-And-Answer (Q&A) data store 124. The Q&A data store 124 may include text collected from one or more authoritative sources, for example one or more of servers 190, that has been assigned a topic and a question category. The text may be stored in data store 124 as it appears in the original document, for example in the form of one or more paragraphs or a list of items. Accordingly, the text may represent multiple facts that can be determined from the paragraph(s) or list. The Q&A data store 124 may store the text keyed by topic and question category. The Q&A data store 124 may also include other information for the text, such as an identifier of a document the text appears in, a location, e.g. URL, for the document, metadata for the document, values and/or signals that assist in ranking the text, etc. The text stored in the Q&A data store 124 may have a heading portion and a text portion. In some implementations the Q&A data store 124 may include questions and answers for a variety of subject matter. For example, the Q&A data store 124 may store questions and answers for health-related questions, for hobby-related questions, for cooking-related questions, etc. In some implementations the topic and question category may be unique to a particular subject matter area. In some implementations, the Q&A data store 124 may also include an indication of subject matter for each topic and question category.
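By way of a hedged example, a data store keyed by topic and question category could be sketched in Python as an in-memory mapping. The class names, field names, and the example URL below are hypothetical and chosen only for illustration; an actual implementation may use any storage format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadingTextPair:
    heading: str  # heading portion
    text: str     # text portion, e.g. a paragraph or list as it appears in the document
    url: str      # location of the source document


class QADataStore:
    """Minimal in-memory store keyed by (topic, question category)."""

    def __init__(self):
        self._entries = defaultdict(list)

    def put(self, topic, category, pair):
        # Assumption: topics are compared case-insensitively.
        self._entries[(topic.lower(), category)].append(pair)

    def get(self, topic, category):
        return list(self._entries.get((topic.lower(), category), []))


store = QADataStore()
pair = HeadingTextPair(
    heading="Diabetes treatment",
    text="Treatment options include diet, exercise, and medication.",
    url="https://example.com/diabetes",  # placeholder URL
)
store.put("diabetes", "treatment", pair)
print(store.get("Diabetes", "treatment"))
```

Keying on the (topic, question category) tuple lets a query that resolves to the same two values retrieve every stored heading-text pair in a single lookup.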

The search engine 116 may search the Q&A data store 124 in addition to other document corpora in responding to a search request. For example, the search engine 116 may also be capable of searching a corpus of crawled documents 120 in addition to the Q&A data store 124. Crawled documents 120 may include an index for searching for terms or phrases within a corpus of documents. In some implementations the corpus may be documents available on the Internet. Documents may include any type of file that stores content, such as sound files, video files, text documents, source code, news articles, blogs, web pages, PDF documents, spreadsheets, etc. In some implementations, crawled documents 120 may store one-dimensional posting lists that include phrases, terms, or document properties as posting list values and, for each posting list value, identifiers for documents related to the phrase or term. While an index for crawled documents 120 has been described as using posting lists, the index may have some other known or later developed format. In some implementations, the search results from crawled documents 120 may be used to generate intent templates, to determine whether a query includes a clear-intent question, to determine a question category for a query, etc.

The system 100 may also include search records 122. Search records 122 may include search logs, aggregated data gathered from queries, or other data regarding the search terms and search results of previously processed queries. Certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.

In some implementations, the search records 122 may be generated by search engine 116 in the normal process of generating search results 184. The Q&A data store 124, crawled documents 120, and search records 122 are stored on tangible computer-readable storage devices, for instance disk, flash, cache memory, main memory, or a combination of these, configured to store data in a semi-permanent or non-transient form. In some implementations Q&A data store 124, crawled documents 120, and search records 122 may be stored in a combination of various memories.

In some implementations, the system 100 may include an indexing engine (not shown) that includes one or more processors configured to execute one or more machine executable instructions or pieces of software, firmware, or a combination thereof to create and maintain Q&A data store 124 and/or crawled documents 120, etc. The indexing engine may obtain content from, for example, one or more servers 190, and use the content to maintain crawled documents 120. In some implementations, the servers 190 may be web servers, servers on a private network, or other document sources that are accessible by the indexing engine.

The search engine 116 may include one or more computing devices that use the Q&A data store 124 and/or crawled documents 120 to determine search results 184 for queries 182. Search results from crawled documents 120 may be determined, for example, using conventional or other information retrieval techniques and represent conventional snippet-based results. Search results from the Q&A data store 124 represent natural language results. Search engine 116 may include one or more servers that receive queries 182 from a requestor, such as client 180, and provide search results 184 to the requestor. Search results 184 may include snippet information from documents responsive to the query and information from the Q&A data store 124. For example, the search engine 116 may include a ranking engine that identifies documents responsive to the query from crawled documents 120, identifies answers in Q&A data store 124 responsive to the query and calculates scores for the documents and answers responsive to the query, for example, using one or more ranking signals. The ranking engine may rank the documents and answers found responsive to the query using the scores.

The system 100 may also include a Q&A engine 110. The Q&A engine 110 may include one or more computing devices that include one or more processors configured to execute machine executable instructions or pieces of software, firmware, or a combination thereof. The Q&A engine 110 may share a computing device or devices with the search engine 116, or may operate using one or more separate computing devices. The Q&A engine 110 may use the Q&A data store 124, the search records 122, and the crawled documents 120 to generate intent templates 126, to populate and maintain the Q&A data store 124, and to determine if a query includes a clear-intent question that can be answered by the Q&A data store 124. For example, the search engine 116 may send a query to the Q&A engine 110 and the Q&A engine 110 may provide natural language answers from Q&A data store 124 to the search engine 116, as appropriate. The natural language answers may be ranked by the Q&A engine 110 or by the search engine 116 using data provided by the Q&A engine.

The Q&A engine 110 may populate and maintain Q&A data store 124 by determining heading-text pairs found in documents from the authoritative sources. An authoritative source may be a source that is identified by a system administrator as authoritative, a source that is popular and trusted, for example as determined by frequent selection of the source in search results, or a source that consistently ranks high in search results for queries dealing in the subject matter of the Q&A data store 124. The Q&A engine 110 may parse documents associated with an authoritative source from the crawled documents 120 or the Q&A engine 110 may include a web crawler that collects documents and related information from the authoritative sources. In one implementation, the authoritative sources may be identified by a domain name, a Uniform Resource Locator (URL) or a Uniform Resource Identifier (URI). As is known, all web pages and documents associated with the domain may be considered from the authoritative source. Documents from the authoritative source may be considered authoritative documents.

In some implementations, the Q&A engine 110 may use intent templates 126 to populate and maintain the Q&A data store 124 and evaluate query information from the search engine 116. The templates 126 may be derived from content available from authoritative sources and from previously processed queries and their returned results. Each template 126 may include a non-variable portion and a variable portion. The non-variable portion may be text and the variable portion may be a placeholder for one or more words. For example, a template of “$X causes” has a non-variable portion of “causes” preceded by a variable portion. As another example, a template of “recipe for $X” has a non-variable portion of “recipe for” followed by a variable portion. A query or heading that corresponds to or matches the template “$X causes” includes one or more words followed by the word “causes”, for example “diabetes causes” or “heart attack causes.” The portion that matches the variable portion, for example “diabetes” or “heart attack” for a template of “$X causes” or “split pea soup” for a template of “recipe for $X” may be considered a topic of the query or heading.
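For illustration, matching a query or heading against a template and extracting the topic might be sketched in Python with regular expressions, as follows. The helper names are hypothetical, and the sketch assumes each template contains exactly one "$X" variable portion; the disclosure does not prescribe a regular-expression approach.

```python
import re


def template_to_regex(template):
    """Build a regex whose single capture group stands in for the $X variable portion."""
    parts = template.split("$X")  # assumption: exactly one $X per template
    pattern = "(.+)".join(re.escape(part) for part in parts)
    return re.compile("^" + pattern + "$", re.IGNORECASE)


def extract_topic(template, text):
    """Return the topic matched by the variable portion, or None if there is no match."""
    match = template_to_regex(template).match(text.strip())
    return match.group(1).strip() if match else None


print(extract_topic("$X causes", "heart attack causes"))
print(extract_topic("recipe for $X", "recipe for split pea soup"))
print(extract_topic("$X causes", "diabetes symptoms"))
```

The first two calls yield the topics named in the examples above ("heart attack" and "split pea soup"), while the third yields None because the non-variable portion "causes" is absent.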

Each of the templates 126 may be assigned to a question category that represents a variety of questions used to request the same specific information. For example, the templates “how do I treat $X”, “$X treatment”, “how is $X treated”, and “how to cure $X” may all be templates for a treatment question category. Likewise, the templates “how to make $X,” “$X recipe,” and “directions for making $X” may all be templates for a recipe question category. These questions may be assigned to the question category manually or automatically through similarity of search results returned for queries conforming to the template. For example, if the search results for a query of “how is diabetes treated” and “what cures diabetes” are similar, the Q&A engine 110 may cluster the two templates “how is $X treated” and “what cures $X” together under the treatment question category.

The search system 100 may be in communication with the client(s) 180 and/or servers 190 over network 160. Network 160 may be, for example, the Internet, or the network 160 can be a wired or wireless local area network (LAN), wide area network (WAN), etc., implemented using, for example, gateway devices, bridges, switches, and/or so forth. Via the network 160, the search engine 116 may communicate with and transmit data to/from clients 180. For example, search engine 116 may transmit search results or suggested updates to one or more of clients 180.

FIG. 2 illustrates an example of a user interface 200 showing enhanced search results that include natural language answers, consistent with disclosed implementations. A search system, such as system 100 of FIG. 1, may generate the user interface 200 in response to a query such as “what are the symptoms of mono” or “mononucleosis symptoms.” In the example of FIG. 2, the enhanced search results may include natural language answers 205, which have been gathered from authoritative sources. In the example of FIG. 2, the natural language answers 205 appear ahead of snippet-based search results 250, but they could be interspersed with the snippet-based search results, to the right or left of snippet-based search results, in a pop-up window, etc. As illustrated in user interface 200, the natural language answers 205 may include a text portion 210 that allows a query requestor to see text that directly answers the query. The natural language answers 205 may also include a link 215 that allows the query requestor to determine the source of the text portion 210 and to navigate to the source document in the authoritative source if more information or context is desired. In some implementations, the natural language answers 205 may be selected as the highest ranking answers of a set of possible answers. In some implementations, the ranking algorithm used to rank the snippet-based search results 250 may also be used to rank the natural language results 205.

In some implementations, the natural language answers 205 include answers from sources or documents that do not appear in the snippet-based search results 250. For example, if the user issues a natural language query, the snippet-based results 250 may have been selected from the documents, for example crawled documents 120, based on a keyword level search while the natural language answers 205 are based on the intent of the natural language query. In some implementations, the search system may use intent templates, described in more detail below, to translate the natural language query into a keyword query and the keyword query may be used to determine the snippet-based results 250.

FIG. 3 illustrates a flow diagram of an example process 300 for providing search results enhanced with natural language answers, consistent with disclosed implementations. The process 300 may be performed by a search system, such as system 100 of FIG. 1. Once the search system has established a Q&A data store, the search system may perform the steps of process 300 independently of each other. In other words, the search system may generate new intent templates independently of generating entries in the Q&A data store. The search system may also respond to queries using the Q&A data store concurrently with generating entries in the Q&A data store.

Process 300 may begin with the search system generating intent templates (305). This step may be optional if the search system uses some other means of identifying queries with clear-intent questions and identifying information used to populate the Q&A data store. In some implementations, the search system may perform step 305 periodically, for example once a week or once a month, to determine whether new templates can be added. The search system may also generate and maintain the Q&A data store (310) by parsing documents from the authoritative sources, collecting pairs in the form <heading, text> from the document content, endeavoring to assign a respective topic and question category to each of the heading-text pairs, and storing the heading-text pair in the Q&A data store by topic and question category. In some implementations, the search system may perform step 310 periodically, for example daily or weekly. In some implementations the period may depend on the subject matter. For example, medical information may be relatively stable, so that step 310 may be performed less frequently for medical authoritative sources than, for example, for cooking authoritative sources, which may have content that changes more frequently. The search system may also use the Q&A data store and, in some implementations, the intent templates to provide natural language search results in response to a query that includes a clear-intent question (315). The search system may perform step 315 as-requested, so that the Q&A data store is continuously or nearly-continuously available to respond to queries.

FIG. 4 illustrates a flow diagram of an example process 400 for generating intent templates, consistent with disclosed implementations. A search system, such as search system 100 of FIG. 1, may perform process 400 as part of step 305 of FIG. 3. It is understood that some of the steps illustrated in FIG. 4 are optional, and implementations need not perform each step, or may perform the steps in a different order.

Process 400 may begin with the search system obtaining possible intent questions from authoritative sources (405). The authoritative sources may be manually identified or may be automatically selected. Authoritative sources may include, for example, general sources and focused sources. As an example, the domains webmd.com, mayoclinic.com, and medicinenet.com may be general authoritative sources for medical subject matter and the domains cancer.org and heart.org may be focused authoritative sources for medical subject matter. Similarly, allrecipes.com and foodnetwork.com may be general authoritative sources for cooking subject matter and vegetariantimes.com may be a focused authoritative source for cooking subject matter. Intent questions may be identified from headings in the content of documents associated with the authoritative sources. In web pages, for example, headings may be assumed to include intent questions and the search system may identify headings by mark-up language tags, by a larger font size, or some other type of formatting.

In some implementations, the search system may also obtain potential intent questions from search records (410). Search records may include search logs, aggregated data gathered from queries, or other data regarding the search terms and search results of previously processed queries. From the search records the search system may identify queries that relate to the subject matter of the Q&A data store. For example, the search system may identify queries having search results that include the authoritative sources in a position of prominence in the search results. For example, if the subject matter is medical information the search system may look for query results with documents from mayoclinic.com or webmd.com in the top ranking search results. The search system may then assume that the query associated with such identified search results includes a clear-intent question. By looking for clear-intent questions from queries as well as from authoritative sources, the search system may account for the various ways an intent question can be posed. For example, “heart disease treatment” and “how do I treat heart disease?” both represent the same intent question, but an authoritative source may be more likely to include the former while a query may be more likely to include the latter.

The search system may convert the potential intent questions to potential intent templates (415). For example, the search system may replace subsets of consecutive terms in the intent question with a variable or placeholder, such as $X. For example, “how diabetes is treated” may yield the potential templates of “how diabetes is treated”, “how $X”, “how diabetes $X”, “how diabetes is $X”, “$X treated”, “$X is treated”, “$X diabetes is treated”, “how $X treated”, “how $X is treated”, and “how diabetes $X treated”. As another example, “How to make hummus” may yield the potential templates of “how to make hummus”, “how $X”, “how to $X”, “how to make $X”, “how $X make hummus”, “how $X hummus”, “$X hummus”, “$X make hummus”, etc. Of course, it is understood that in some implementations not every possible potential intent template need be generated. For example, the search system may not include question words, e.g., who, what, how, when, where, etc., in the consecutive terms replaced by the placeholder.
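As a minimal sketch of step 415, the following Python function enumerates potential templates by replacing each run of consecutive terms (other than the whole question) with a placeholder. The function name potential_templates and its placeholder parameter are hypothetical, and the sketch does not apply the optional exclusion of question words described above.

```python
def potential_templates(question, placeholder="$X"):
    """Return the question plus every variant in which one run of
    consecutive terms has been replaced by the placeholder."""
    terms = question.lower().split()
    n = len(terms)
    templates = {" ".join(terms)}  # the unmodified question is itself a candidate
    for start in range(n):
        for end in range(start + 1, n + 1):
            if end - start == n:
                continue  # replacing every term would leave only the placeholder
            templates.add(" ".join(terms[:start] + [placeholder] + terms[end:]))
    return templates


templates = potential_templates("how diabetes is treated")
```

Under these assumptions, the call above yields the ten potential templates listed for “how diabetes is treated”.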

The search system may determine the potential templates that occur most frequently across the documents and/or the queries associated with the authoritative sources (420). In one implementation, the search system may generate a histogram of the potential templates generated in step 415. The search system may select the potential templates appearing most frequently as intent templates (425). In some implementations, the search system may select a predetermined number of the potential templates. In some implementations, the search system may select all templates that occur a predetermined number of times. In some implementations, the search system may use a combination of a minimum number of templates and a minimum number of occurrences.
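Steps 420 and 425 may be illustrated with a short Python sketch that builds the histogram with collections.Counter and then applies a count threshold and an optional cap. The function name and the parameters max_templates and min_occurrences are assumptions made for illustration and are not taken from the disclosed implementations.

```python
from collections import Counter


def select_intent_templates(potential, max_templates=None, min_occurrences=1):
    """Pick the most frequent potential templates as intent templates."""
    histogram = Counter(potential)  # template -> number of occurrences
    # most_common() lists templates from most to least frequent.
    ranked = [t for t, count in histogram.most_common() if count >= min_occurrences]
    return ranked[:max_templates]  # a cap of None keeps every qualifying template


observed = ["$X symptoms"] * 3 + ["how to make $X"] * 2 + ["how $X"]
selected = select_intent_templates(observed, min_occurrences=2)
```

In this hypothetical input, “how $X” occurs only once and is therefore not selected.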

The search system may then associate each intent template generated with a question category (430). The question category clusters like-templates together. Thus, all templates relating to treatment or cure for a disease or condition may be clustered together using the same question category. In some implementations there may be hundreds of question categories. In some implementations, the assignment of a question category may be manual. For example, the search system may present the intent templates to a user who selects a question category for each intent template. In other implementations, the assignment may be automatic. For example, the search system may issue a query using the intent template, the query replacing the variable portion with a topic relevant to the subject matter. For instance, if the subject matter is medical related, the topic may be a disease, drug, or condition name. The search system may issue the queries using the same topic to replace the variable portion in each intent template. The search system may then compare the search results returned for each intent template. Templates that result in similar search results may be clustered together and the intent templates in the cluster may be assigned a question category. In some implementations, the search system may use a combination of automatic and manual question category assignment, so that templates with a minimum degree of similarity between search results are assigned the same question category, and those that fail to reach a minimum degree of similarity with other intent templates are manually assigned a question category by a user. The search system may store the intent templates and their respective question categories in a data store. Once the search system establishes intent templates, process 400 ends. Because intent templates do not change rapidly, the search system need not repeat process 400 frequently, but it may be beneficial to repeat it periodically. In some implementations, the search system may generate thousands of intent templates.
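The automatic assignment of step 430 may be sketched as follows. The description does not specify a similarity measure or a clustering method, so this sketch assumes Jaccard overlap of result identifiers and a simple greedy grouping. The names jaccard and cluster_templates, the threshold value, and the sample result identifiers are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two result lists, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_templates(results_by_template, threshold=0.5):
    """Greedily group templates whose search results are similar enough.

    results_by_template maps each intent template to the results returned
    when its variable portion is replaced with the same topic.
    """
    clusters = []
    for template, results in results_by_template.items():
        for cluster in clusters:
            representative = cluster[0]
            if jaccard(results, results_by_template[representative]) >= threshold:
                cluster.append(template)
                break
        else:
            clusters.append([template])  # no similar cluster found, start a new one
    return clusters


clusters = cluster_templates({
    "$X treatment": ["doc-a", "doc-b", "doc-c"],
    "how to treat $X": ["doc-a", "doc-b", "doc-d"],
    "$X symptoms": ["doc-x", "doc-y", "doc-z"],
})
```

Each resulting cluster could then be given one question category, with any template left in a single-member cluster being a candidate for manual assignment.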

FIG. 5 illustrates a flow diagram of an example process 500 for generating a Q&A data store for providing natural language answers to queries, consistent with disclosed implementations. A search system, such as search system 100 of FIG. 1, may perform process 500 as part of step 310 of FIG. 3. Process 500 may begin by parsing documents associated with authoritative sources, resulting in the generation of heading-text pairs (505). For example, the search system may search document contents from the authoritative sources for headings, as described above with regard to step 405 of FIG. 4. When a heading is located, the heading and the text associated with the heading may be captured as a heading-text pair. The heading-text pair may include the heading from the document content as a heading portion and, for example, the text appearing in the content of the document after the heading as the text portion. The text portion may be paragraphs that follow the heading, a list of items that follow the heading, or a combination of these. In some implementations, the text that appears after one heading and before another heading may be the text portion of the heading-text pair. In some implementations, a user may manually mark headings and the text to be associated with the headings. This may be useful for authoritative content that changes infrequently and that does not conform to the heading-text pair identification described above.
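One way step 505 might be realized for web pages is sketched below using the HTMLParser class from the Python standard library. The sketch assumes that headings are marked with h1 through h6 tags and that the text portion is everything between one heading and the next. The class name HeadingTextParser and the function extract_pairs are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingTextParser(HTMLParser):
    """Collects (heading, text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_heading = False
        self._heading_parts = []
        self._text_parts = []
        self._current = None  # heading whose text is being collected

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.flush()  # a new heading closes the previous pair
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False
            self._current = " ".join(self._heading_parts)
            self._text_parts = []

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        if self._in_heading:
            self._heading_parts.append(data)
        elif self._current is not None:
            self._text_parts.append(data)  # text before the first heading is ignored

    def flush(self):
        if self._current is not None and self._text_parts:
            self.pairs.append((self._current, " ".join(self._text_parts)))
        self._current = None


def extract_pairs(html):
    parser = HeadingTextParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit the pair for the last heading in the document
    return parser.pairs
```

This sketch flattens paragraphs and list items into a single string; an implementation that ranks list-form answers differently would need to preserve that structure.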

In some implementations, only headings that exhibit a clear intent question may be stored in a heading-text pair. For example, the headings may be compared against the intent templates to determine whether each heading corresponds to an intent template. For example, the heading “Cancer symptoms” may correspond with an intent template of “$X symptoms” and the heading “Truffle Recipe” may correspond with an intent template of “$X Recipe.” The variable portion of an intent template, e.g., $X, may represent one or more words. Thus, for example, “heart disease symptoms” also matches the intent template of “$X symptoms” and “Chocolate Cake Recipe” may correspond with an intent template of “$X Recipe.”

The search system may aggregate the heading-text pairs identified in the authoritative sources by question category (510). In some implementations, the search system may use intent templates to aggregate the heading-text pairs. For example, as explained above, the search system may attempt to match a heading portion to an intent template. If a match is found, the heading-text pair may be assigned the question category that is assigned to the matching intent template. The search system may aggregate the heading-text pairs by assigned question category. In some implementations, the search system may cluster the heading portions without using intent templates by using other clustering methods, such as similarity of search results when the heading is used as a query.

The search system may assign a topic to the heading-text pair (515). The topic may represent the specific focus of the question or heading. In a medical subject area, the topics may represent various diseases, injuries, drugs, or conditions. In some implementations, the search system may use the intent templates to assign a topic. For example, the variable portion of an intent template that matches the heading may be used to determine the topic for the heading-text pair. Thus, a heading of “initial symptoms of mono,” which matches an intent template of “symptoms of $X”, may be assigned a topic of “mono” and a heading of “pepperoni pizza ingredients,” which corresponds to a template of “$X ingredients”, may be assigned a topic of “pepperoni pizza.”
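The template matching used in steps 510 and 515 may be sketched with regular expressions, as below. The sketch assumes that each intent template contains exactly one placeholder and that the whole heading must match the template, so a heading with extra leading words, such as “initial symptoms of mono”, would need a looser match than shown here. The function names and the sample category labels are hypothetical.

```python
import re


def template_to_regex(template, placeholder="$X"):
    """Compile a template with one placeholder into a full-match pattern."""
    fixed_parts = [re.escape(part.lower()) for part in template.split(placeholder)]
    return re.compile("^" + "(.+)".join(fixed_parts) + "$")


def match_heading(heading, templates):
    """Return (topic, question_category) for the first matching template.

    templates maps an intent template to its question category.
    """
    text = " ".join(heading.lower().split())  # normalize case and whitespace
    for template, category in templates.items():
        match = template_to_regex(template).match(text)
        if match:
            return match.group(1).strip(), category  # variable portion is the topic
    return None


templates = {"$X symptoms": "symptoms", "$X ingredients": "ingredients"}
result = match_heading("pepperoni pizza ingredients", templates)
```

A heading that matches no template, such as a bare “Treatment”, returns None in this sketch and would be treated as ambiguous.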

In some implementations, the topic may not be included in the heading. For example, the heading may simply state “treatment” or “causes”. Such a heading-text pair may be considered ambiguous. In such a situation, the system may use the context of the heading to determine a topic. For example, in some implementations, the dominant terms from a document in which the heading appears may be determined and the topic may be selected from the dominant terms. For example, the search system may compare the dominant terms in a document to the topics associated with other heading-text pairs in the same question category as the ambiguous heading-text pair. If a dominant term corresponds to a topic of other heading-text pairs with the same question category as the ambiguous heading-text pair, the search system may associate the ambiguous heading-text pair with the matching topic. In some implementations the uniform resource locator of a document may be used to determine the topic. For example, some authoritative sources use the name of a disease as part of the URL. The search system may compare portions of the URL with topics assigned to other heading-text pairs in the same question category as the ambiguous heading-text pair. If a topic match is found, the system may assign the ambiguous heading-text pair to the matching topic. Other methods of using context, such as semantic analysis, may be used to determine a topic for an ambiguous heading-text pair.
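The URL-based approach may be sketched as follows. The sketch assumes that a topic appears as a whole path segment, with words separated by hyphens, underscores, or plus signs. The function name topic_from_url, the example.org URL, and the list of known topics are hypothetical.

```python
import re
from urllib.parse import urlparse


def topic_from_url(url, known_topics):
    """Return the first known topic that appears as a path segment of the URL."""
    path = urlparse(url).path.lower()
    # Turn segments such as "heart-disease" into "heart disease".
    segments = {" ".join(re.split(r"[-_+]", seg)) for seg in path.split("/") if seg}
    for topic in known_topics:
        if topic.lower() in segments:
            return topic
    return None


topic = topic_from_url(
    "https://example.org/diseases/heart-disease/treatment",
    ["diabetes", "heart disease"],
)
```

Here known_topics stands in for the topics already assigned to other heading-text pairs in the same question category as the ambiguous pair.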

The search system may then store the heading-text pair in the Q&A data store, keyed by the assigned topic and question category (520). The Q&A data store may store the heading-text pair as text that can be offered as a natural language answer to a query that includes an intent question with the same topic and question category as the heading-text pair. The Q&A data store may also include other information for the heading-text pair, such as a URL or other identifier of the document from which the heading-text pair was pulled, metadata and other information used to rank the heading-text pair, etc. The search system may repeat steps 515 and 520 for each heading-text pair that was identified in the documents associated with the authoritative sources and assigned a question category. In some implementations, the Q&A data store may include tens of thousands of entries.
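A minimal in-memory sketch of a data store keyed by topic and question category follows. The class names QARecord and QADataStore, their fields, and the sample record are hypothetical, and the description does not specify a storage technology; a production system would likely use a persistent store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    heading: str
    text: str
    source_url: str  # identifier of the document the pair was taken from


class QADataStore:
    """Heading-text pairs keyed by (topic, question category)."""

    def __init__(self):
        self._records = defaultdict(list)

    def add(self, topic, category, record):
        self._records[(topic.lower(), category)].append(record)

    def has_key(self, topic, category):
        return (topic.lower(), category) in self._records

    def lookup(self, topic, category):
        # .get() avoids creating an empty entry for an unknown key.
        return list(self._records.get((topic.lower(), category), []))


store = QADataStore()
store.add("mono", "symptoms",
          QARecord("Symptoms of mono", "Fatigue, fever, sore throat.",
                   "https://example.org/mono"))
```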

FIG. 6 illustrates a flow diagram of an example process 600 for using the Q&A data store to provide an answer to a query, consistent with disclosed implementations. A search system, such as search system 100 of FIG. 1, may perform process 600 as part of step 315 of FIG. 3. For example, the search system may receive a query from a query requestor and may perform process 600 in addition to, or instead of, a process that searches indexed documents and generates snippet-based search results for the query. Process 600 may begin by determining whether the query includes a clear-intent question (605). When a query can be matched to a topic/question category key in the Q&A data store, it includes a clear-intent question. In some implementations, determining whether the query includes a clear-intent question may involve using intent templates, as explained in more detail below with regard to FIG. 7. In some implementations determining whether the query includes a clear-intent question may involve analyzing the search results for a query. For example, the search system may compare top-ranked search results for the query with top-ranked search results of a query that includes the topic and question category of records in the Q&A data store. For example, the search system may determine that the query includes the term “cancer” and that cancer is a topic stored in the Q&A data store. For each unique question category in the Q&A data store paired with the cancer topic, the search system may issue a query that includes the topic and question category. The search results returned may be compared to the search results of the query. If the two results are similar enough, e.g., meet a similarity threshold, the system may determine that the query includes a clear-intent question that matches the topic and question category. If the query does not include a clear-intent question (605, No), process 600 ends without providing natural language answers for the query.

If the query does include a clear-intent question (605, Yes), the search system may retrieve records from the Q&A data store that match the topic/question category combination and rank the retrieved records (610). In some implementations, the search system may use a ranking method that mirrors the ranking method used to rank snippet-based search results. This may assure that top sources in the snippet-based search results, e.g., results 250 of FIG. 2, appear as top sources in the natural language results, e.g., natural language results 205 of FIG. 2. In some implementations, the natural language results, e.g., the records retrieved from the Q&A data store, may be ranked differently than search results from other sources. For example, the search system may rank shorter answers ahead of more lengthy answers, may rank answers with bullet points ahead of paragraph-form answers, may rank answers from focused authoritative sources ahead of answers from general authoritative sources when the focused source matches the topic of the query, etc. In some implementations, the Q&A data store records may include an indication of how much of the text is common to other text with the same topic/question category key. This may be a way for the search system to automatically determine which answers include consensus and are, therefore, better answers.
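The example ranking signals of step 610 may be combined in many ways. The sketch below assumes one possible priority order (focused source first, then list-form answers, then shorter text), which is an illustrative choice and is not stated in the description. The Answer class and its fields are hypothetical, and from_focused_source is assumed to be set only when the focused source matches the topic of the query.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    is_list: bool               # answer is a list of items rather than a paragraph
    from_focused_source: bool   # focused authoritative source matching the query topic


def rank_answers(answers):
    """Order answers by the assumed priority of the three example signals."""
    # False sorts before True, so each flag is negated to put preferred answers first.
    return sorted(
        answers,
        key=lambda a: (not a.from_focused_source, not a.is_list, len(a.text)),
    )


ranked = rank_answers([
    Answer("A long paragraph answer from a general source.", False, False),
    Answer("Short list", True, False),
    Answer("Focused answer", False, True),
])
```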

The search system may then select at least one of the ranked records retrieved from the Q&A data store to provide as a search result for the query (615). In some implementations, a predetermined number of the top-ranking records may be selected. In some implementations the search result includes a link to the source document in addition to the natural language text. In some implementations, the search system may remove duplicate documents from the snippet-based search results. For example, if a natural language result is provided for a particular document and a snippet-based result is also provided, the search system may remove the snippet-based result from the results provided to the query requestor.
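The duplicate removal described above may be sketched as a filter on document identifiers. The sketch assumes each result is a dictionary with a "url" key identifying its source document; that representation and the function name are hypothetical.

```python
def remove_duplicate_snippets(natural_language_results, snippet_results):
    """Drop snippet results whose document already supplies a natural language result."""
    answered = {result["url"] for result in natural_language_results}
    return [snippet for snippet in snippet_results if snippet["url"] not in answered]


remaining = remove_duplicate_snippets(
    [{"url": "https://example.org/mono", "text": "Fatigue, fever, sore throat."}],
    [{"url": "https://example.org/mono", "snippet": "... symptoms of mono ..."},
     {"url": "https://example.com/other", "snippet": "... mono in teens ..."}],
)
```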

In some implementations, the system may not provide natural language answers to a query that may otherwise be identified as a clear-intent query. For example, the query ‘center of disease control and prevention’ may match an intent template of ‘$X prevention’ but the intent of this query differs from a query of ‘diabetes prevention.’ As another example, the query ‘how to make money’ may match an intent template of ‘how to make $X’ but the intent differs from a query of ‘how to make pudding.’ Thus, a system administrator may include the undesired queries in a blacklist, which can be stored in a memory of the search system. If the query corresponds to a blacklisted query, the search system may not perform process 600 for the query.

FIG. 7 illustrates an example of a process 700 for determining whether a query includes an intent question, consistent with disclosed implementations. Process 700 may be performed by a search system as part of step 605 of FIG. 6. Process 700 may begin with the search system generating potential intent templates from the query (705), as explained in more detail above with regard to step 415 of FIG. 4. The search system may then determine whether any of the potential intent templates correspond to an intent template (710). If not (710, No), the query does not include an intent question and process 700 may end. If the query does correspond to an intent template (710, Yes), the search system may determine whether the query corresponds to any of the topics in the Q&A data store (715). For example, the portion of the query that maps to a variable portion of the intent template may be assumed to be the topic for the query. The search system may look at the Q&A data store to determine whether the combination of this topic and the question category assigned to the matching intent template appear as a key in the Q&A data store. If not (715, No), the query does not include an intent question and process 700 ends. If a matching topic is found (715, Yes), the search system may return the matching topic and the question category of the matching intent template, from step 710. The topic and question category may then be used to retrieve natural language answers from the Q&A data store, as described above with regard to FIG. 6.
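Steps 705 through 715 may be sketched end to end as follows. The sketch assumes that intent templates are stored in lowercase with a single $X placeholder, that store_keys is the set of (topic, question category) keys present in the Q&A data store, and that the query has already been stripped of punctuation. The function name find_intent and the sample data are hypothetical.

```python
def find_intent(query, intent_templates, store_keys, placeholder="$X"):
    """Return (topic, question_category) if the query has a clear intent, else None.

    intent_templates maps an intent template to its question category.
    """
    terms = query.lower().split()
    n = len(terms)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if end - start == n:
                continue  # the placeholder may not replace the whole query
            # Step 705: build one potential template from the query.
            candidate = " ".join(terms[:start] + [placeholder] + terms[end:])
            # Step 710: does it correspond to a known intent template?
            category = intent_templates.get(candidate)
            if category is None:
                continue
            # Step 715: the replaced terms are assumed to be the topic.
            topic = " ".join(terms[start:end])
            if (topic, category) in store_keys:
                return topic, category
    return None


intent_templates = {"how to treat $X": "treatment", "$X symptoms": "symptoms"}
store_keys = {("heart disease", "treatment"), ("cancer", "symptoms")}
intent = find_intent("How to treat heart disease", intent_templates, store_keys)
```

A query that matches a template but whose topic is absent from the data store returns None in this sketch, corresponding to the (715, No) branch.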

In some implementations, process 700 may also be used to convert a natural language query to a keyword query to improve the snippet-based search results returned in response to the query. For example, if the search system determines that the natural language query includes a clear-intent question using process 700, the search system may use the corresponding topic and the question category to issue a new query in place of the natural language query. For example, the search system may use the topic and the question category to search an index of crawled documents for documents responsive to the topic and question category. The responsive documents may be used to generate snippet-based search results. Thus, process 700 may be used to generate higher-quality conventional search results for a natural language query, in addition to providing natural language results.

It is to be understood that while the examples above relate generally to topics and questions in the medical subject area, implementations are not limited to such applications. The methods, system, and techniques described above may be applied to any subject area where authoritative sources may be identified.

FIG. 8 shows an example of a generic computer device 800, which may be system 100, and/or client 180 of FIG. 1, which may be used with the techniques described here. Computing device 800 is intended to represent various example forms of computing devices, such as laptops, desktops, workstations, personal digital assistants, cellular telephones, smart phones, tablets, servers, and other computing devices, including wearable devices. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the inventions described and/or claimed in this document.

Computing device 800 includes a processor 802, memory 804, a storage device 806, and expansion ports 810 connected via an interface 808. In some implementations, computing device 800 may include transceiver 846, communication interface 844, and a GPS (Global Positioning System) receiver module 848, among other components, connected via interface 808. Device 800 may communicate wirelessly through communication interface 844, which may include digital signal processing circuitry where necessary. Each of the components 802, 804, 806, 808, 810, 840, 844, 846, and 848 may be mounted on a common motherboard or in other manners as appropriate.

The processor 802 can process instructions for execution within the computing device 800, including instructions stored in the memory 804 or on the storage device 806 to display graphical information for a GUI on an external input/output device, such as display 816. Display 816 may be a monitor or a flat touchscreen display. In some implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 800 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).

The memory 804 stores information within the computing device 800. In one implementation, the memory 804 is a volatile memory unit or units. In another implementation, the memory 804 is a non-volatile memory unit or units. The memory 804 may also be another form of computer-readable medium, such as a magnetic or optical disk. In some implementations, the memory 804 may include expansion memory provided through an expansion interface.

The storage device 806 is capable of providing mass storage for the computing device 800. In one implementation, the storage device 806 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in such a computer-readable medium. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The computer- or machine-readable medium is a storage device such as the memory 804, the storage device 806, or memory on processor 802.

The interface 808 may be a high speed controller that manages bandwidth-intensive operations for the computing device 800 or a low speed controller that manages lower bandwidth-intensive operations, or a combination of such controllers. An external interface 840 may be provided so as to enable near area communication of device 800 with other devices. In some implementations, controller 808 may be coupled to storage device 806 and expansion port 814. The expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.

The computing device 800 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 830, or multiple times in a group of such servers. It may also be implemented as part of a rack server system. In addition, it may be implemented in a personal computer such as a laptop computer 822, or smart phone 836. An entire system may be made up of multiple computing devices 800 communicating with each other. Other configurations are possible.

FIG. 9 shows an example of a generic computer device 900, which may be system 100 of FIG. 1, which may be used with the techniques described here. Computing device 900 is intended to represent various example forms of large-scale data processing devices, such as servers, blade servers, datacenters, mainframes, and other large-scale computing devices. Computing device 900 may be a distributed system having multiple processors, possibly including network attached storage nodes, which are interconnected by one or more communication networks. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the inventions described and/or claimed in this document.

Distributed computing system 900 may include any number of computing devices 980. Computing devices 980 may include a server or rack servers, mainframes, etc. communicating over a local or wide-area network, dedicated optical links, modems, bridges, routers, switches, wired or wireless networks, etc.

In some implementations, each computing device may include multiple racks. For example, computing device 980a includes multiple racks 958a-958n. Each rack may include one or more processors, such as processors 952a-952n and 962a-962n. The processors may include data processors, network attached storage devices, and other computer controlled devices. In some implementations, one processor may operate as a master processor and control the scheduling and data distribution tasks. Processors may be interconnected through one or more rack switches 958, and one or more racks may be connected through switch 978. Switch 978 may handle communications between multiple connected computing devices 900.

Each rack may include memory, such as memory 954 and memory 964, and storage, such as 956 and 966. Storage 956 and 966 may provide mass storage and may include volatile or non-volatile storage, such as network-attached disks, floppy disks, hard disks, optical disks, tapes, flash memory or other similar solid state memory devices, or an array of devices, including devices in a storage area network or other configurations. Storage 956 or 966 may be shared between multiple processors, multiple racks, or multiple computing devices and may include a computer-readable medium storing instructions executable by one or more of the processors. Memory 954 and 964 may include, e.g., a volatile memory unit or units, a non-volatile memory unit or units, and/or other forms of computer-readable media, such as magnetic or optical disks, flash memory, cache, Random Access Memory (RAM), Read Only Memory (ROM), and combinations thereof. Memory, such as memory 954, may also be shared between processors 952a-952n. Data structures, such as an index, may be stored, for example, across storage 956 and memory 954. Computing device 900 may include other components not shown, such as controllers, buses, input/output devices, communications modules, etc.

An entire system, such as system 100, may be made up of multiple computing devices 900 communicating with each other. For example, device 980a may communicate with devices 980b, 980c, and 980d, and these may collectively be known as system 100. As another example, system 100 of FIG. 1 may include one or more computing devices 900 as search engine 116. Furthermore, some of the computing devices may be located geographically close to each other, and others may be located geographically distant. The layout of system 900 is an example only and the system may take on other layouts or configurations.

Various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any non-transitory computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory (including Random Access Memory), Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor.

The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

A number of implementations have been described. Nevertheless, various modifications may be made without departing from the spirit and scope of the invention. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

What is claimed is:
 1. A computer system comprising: at least oneprocessor; and memory storing instructions that, when executed by the atleast one processor, cause the computer system to perform operationscomprising: parsing documents from authoritative sources to generate aplurality of heading-text pairs, a heading-text pair being taken fromdocument contents, generating a set of potential templates from theheading-text pairs, determining a quantity of occurrences for at leastsome of the set of potential templates, storing potential templates withhighest quantities as intent templates in a memory of the computersystem; and using the templates to populate a data store of heading-textpairs used to respond to natural language queries.
 2. The computersystem of claim 1, wherein converting the heading to potential templatesincludes replacing subsets of consecutive terms in the heading with avariable portion.
 3. The computer system of claim 1, wherein the set ofpotential templates is a first set of potential templates and the memorystores instructions that, when executed by the at least one processor,cause the computer system to further perform operations including:determining, using search records, previously issued queries that havesearch results associated with the authoritative sources; generating asecond set of potential templates from the determined previously issuedqueries; and including the second set of potential templates with thefirst set of potential templates when determining the quantity ofoccurrences.
 4. The computer system of claim 1, wherein the memory stores instructions that, when executed by the at least one processor, cause the computer system to further perform operations including: assigning a question category to the intent templates, the question category being stored as an attribute of the intent template.
 5. The computer system of claim 4, wherein the memory stores instructions that, when executed by the at least one processor, cause the computer system to further perform operations including: receiving a natural language query; determining an intent template of the intent templates that corresponds to the natural language query, the intent template having an associated question category; determining a topic for the natural language query using the intent template; searching an index of documents for documents responsive to the topic and the associated question category; and providing a search result for the natural language query that includes the documents responsive to the topic and the associated question category.
 6. A system comprising: at least one processor; a data store of heading-text pairs keyed by topic and question category, the heading-text pairs being extracted from content of authoritative sources; and memory storing instructions that, when executed by the at least one processor, cause the system to perform operations including: determining that a query corresponds to an intent template of a plurality of intent templates, the intent template having an associated question category, determining a topic for the query based on the intent template, retrieving heading-text pairs from the data store that have a respective topic and question category key that corresponds with the topic for the query and the question category of the template, and providing a search result for the query, wherein the search result includes at least one of the retrieved heading-text pairs.
 7. The system of claim 6, wherein the intent template includes one non-variable portion and one variable portion, and wherein corresponding the query to the intent template includes: determining that the query includes a first term that corresponds to the one non-variable portion; determining that a second term in the query aligns with the variable portion; and determining that the second term in the query corresponds to a topic in the data store.
 8. The system of claim 7, wherein corresponding the query to the intent template includes: generating potential templates from terms of the query; and determining whether one of the potential templates corresponds to the intent template.
 9. The system of claim 6, the memory storing instructions that, when executed by the at least one processor, cause the system to perform further operations including: searching a document corpus using the topic for the query and the question category of the template as a new query instead of searching the document corpus with the query; and providing search results from the document corpus with the at least one of the retrieved heading-text pairs.
 10. A method comprising: parsing a document from an authoritative source to generate at least one heading-text pair, the text appearing under the heading in the document; assigning a topic and a question category to the heading-text pair; storing the heading-text pair in a data store keyed by the topic and the question category; determining that a query corresponds to the topic and the question category; and providing the heading-text pair as a natural language search result for the query.
 11. The method of claim 10, further comprising: retrieve a plurality of heading-text pairs from the data store, each heading-text pair being keyed by the topic and the question category; and rank the plurality of heading-text pairs; and select a predetermined number of highest ranked heading-text pairs for the search result.
 12. The method of claim 10, further comprising: generate snippet-based search results by searching an index of documents for documents responsive to the query, and provide the snippet-based search results with the natural language search result.
 13. The method of claim 10, wherein assigning a topic to the heading-text pair includes: identifying a topic for another heading-text pair stored in the data store that matches a portion of a URL for the document and has the same question category as the heading-text pair; and associating the heading-text pair with the topic of the other heading-text pair.
 14. The method of claim 10, wherein assigning a topic to the heading-text pair includes: identifying a topic for another heading-text pair stored in the data store that corresponds with dominant terms in the document and has the same question category as the heading-text pair; and associating the heading-text pair with the topic of the other heading-text pair.
 15. The method of claim 10, wherein determining that the query corresponds to the topic and question category includes: determining that the query includes a topic used to key heading-text pairs in the data store; and for each heading-text pair keyed by the topic in the data store: issue a second query that includes the topic and question category for the heading-text pair, compare search results for the second query to search results obtained from a document corpus for the query, and determine that the query corresponds to the topic and question category of the heading-text pair when search results for the second query meet a similarity threshold with regard to the search results for the query from the document corpus.
 16. The method of claim 10, further comprising memory storing a plurality of intent templates and wherein the heading-text pair is generated when the heading conforms to one of the plurality of intent templates.
 17. The method of claim 16, wherein the question category for the heading-text pair is determined by the intent template the heading conforms to.
 18. The method of claim 10, wherein generating the heading-text pair includes: determining a topic from a context of the heading in the document; and adding the topic to a heading portion of the heading-text pair.
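
By way of illustration only, and not as a limitation of any claim, the template generation recited in claims 1 and 2 and the template matching and keyed lookup recited in claims 6, 7, and 10 might be sketched as follows. The sketch is a minimal one under stated assumptions: the `$X` marker for the variable portion, the function names, the whitespace tokenization, the in-memory dictionary standing in for the data store, and the sample headings and text are all hypothetical and do not appear in the disclosure.

```python
from collections import Counter

VAR = "$X"  # hypothetical marker for the variable portion of a template


def potential_templates(heading: str) -> set[str]:
    """Replace each run of consecutive terms with the variable portion."""
    terms = heading.lower().split()
    templates = set()
    for start in range(len(terms)):
        for end in range(start + 1, len(terms) + 1):
            if end - start == len(terms):
                continue  # a template with no fixed term is skipped
            templates.add(" ".join(terms[:start] + [VAR] + terms[end:]))
    return templates


def mine_intent_templates(headings: list[str], top_n: int) -> list[str]:
    """Count potential templates across headings and keep the most frequent."""
    counts = Counter()
    for heading in headings:
        counts.update(potential_templates(heading))
    return [template for template, _ in counts.most_common(top_n)]


def match_query(query: str, template: str, known_topics: set[str]):
    """Return the topic if the query aligns with the template, else None."""
    q = query.lower().split()
    t = template.split()
    i = t.index(VAR)
    head, tail = t[:i], t[i + 1:]
    if len(q) <= len(head) + len(tail):
        return None  # nothing left to align with the variable portion
    if q[:len(head)] != head:
        return None  # leading non-variable terms do not match
    if tail and q[-len(tail):] != tail:
        return None  # trailing non-variable terms do not match
    topic = " ".join(q[len(head):len(q) - len(tail)])
    # The aligned term must also be a topic already used as a key.
    return topic if topic in known_topics else None


def answer(query: str, intent_templates: dict, store: dict) -> list:
    """Look up heading-text pairs keyed by (topic, question category)."""
    topics = {topic for topic, _ in store}
    for template, category in intent_templates.items():
        topic = match_query(query, template, topics)
        if topic is not None:
            return store.get((topic, category), [])
    return []


# Hypothetical sample data, for illustration only.
HEADINGS = ["Symptoms of diabetes", "Symptoms of flu",
            "Symptoms of asthma", "Treatment of flu"]
INTENT_TEMPLATES = {"symptoms of $X": "symptoms"}  # template -> category
STORE = {("flu", "symptoms"): [
    ("Symptoms of flu", "Example text appearing under the heading."),
]}
```

In this sketch the question category is held as the value associated with each intent template, loosely mirroring claim 4, and the data store is keyed by a (topic, question category) tuple. A query such as "symptoms of flu" aligns "flu" with the variable portion and retrieves the stored pair, while a query whose aligned term is not a stored topic returns no natural language result.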